This folder contains the scripts used to conduct the analysis of the data collected at NOCS Southampton, with a Fidas 200S and 40 PM sensors between July 2020 and July 2021.

Data preparation

The datasets used during this analysis are:

  • data from the Fidas 200S: PM mass concentrations, PM size distribution and weather data.

  • data from the low-cost sensors: PM mass concentrations, PM size distribution, weather data.

Fidas 200S data

The data from the Fidas 200S was extracted from the .promo files using PDAnalyze (proprietary software from Palas GmbH) at 1 min resolution. The text files extracted from the .promo files were parsed and stored at:

  • data\df_pm_2min.rds - PM concentrations

  • data\df_distrib_2min.rds - PM size distribution

  • data\df_weather_2min.rds - weather data

Low-cost sensors data

The data from the sensors was sent to an InfluxDB database and saved in data/202007_to_202107_nocs.rds.

The data is then processed by utilities/data_preparation.r and saved in DF_JOINED (see utilities/variables.r for the exact location).

Sensors “PMS-60N1” and “PMS-56N2” have been removed because they intermittently report high peaks. “SPS-40N2” has been removed because it reports a constant value of zero.

Two time periods have been removed:

  • 15th October between 10:00 and 11:00 for a preliminary incense experiment (not presented in the paper).

  • 3rd November all day for the incense experiments described in the paper.
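The exclusions above can be sketched as a single filtering step. This is only an illustration, not the code in utilities/data_preparation.r: the helper name `remove_excluded`, the column names `sensor` and `date`, the year 2020 for both periods, the UTC time zone and the half-open intervals are all assumptions.

```r
# Hypothetical helper: drop the excluded sensors and the incense periods.
# Column names (`sensor`, `date`), the year and the time zone are assumptions.
remove_excluded <- function(df, tz = "UTC") {
  excluded_sensors <- c("PMS-60N1", "PMS-56N2", "SPS-40N2")
  excluded_periods <- list(
    c("2020-10-15 10:00:00", "2020-10-15 11:00:00"), # preliminary incense experiment
    c("2020-11-03 00:00:00", "2020-11-04 00:00:00")  # incense experiments, whole day
  )
  keep <- !(df$sensor %in% excluded_sensors)
  for (period in excluded_periods) {
    start <- as.POSIXct(period[1], tz = tz)
    end <- as.POSIXct(period[2], tz = tz)
    # Half-open interval: start included, end excluded
    keep <- keep & !(df$date >= start & df$date < end)
  }
  df[keep, , drop = FALSE]
}

# Toy data, only to illustrate the behaviour
toy <- data.frame(
  sensor = c("PMS-60N1", "SPS-37N1", "SPS-37N1", "SPS-37N1"),
  date = as.POSIXct(c("2020-10-01 12:00:00", "2020-10-15 10:30:00",
                      "2020-11-03 08:00:00", "2020-11-05 08:00:00"), tz = "UTC")
)
cleaned <- remove_excluded(toy)
```

Only the last toy row survives: the first belongs to an excluded sensor and the next two fall inside the excluded periods.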

The sensors were spread across 4 air quality monitors described in Johnston et al. (2019), called Nocs-1, Nocs-2, Nocs-3, and Nocs-4.

The air quality monitor Nocs-2 stopped working on the 23rd December 2020. This leaves 5 PMS and 6 SPS sensors to study.

Data exploration

The main results are that a delay is visible at 10-second resolution, especially during the incense-peak experiments, and that a temporal resolution of 2 min is preferred given the low concentrations encountered during the study.

Summary of the data from the Fidas 200S

library(dplyr)
library(ggplot2)
library(plotly)

# Load the Fidas 200S datasets and keep the observations from 1 July 2020 onwards
start_date <- as.POSIXct("2020-07-01 00:00:00", tz = "UTC")
df_fidas <- readRDS("C:/Data/Fidas/promo/df_pm_2min.rds")
df_fidas <- df_fidas[df_fidas$date >= start_date, ]
df_weather <- readRDS("C:/Data/Fidas/promo/df_weather_2min.rds")
df_weather <- df_weather[df_weather$date >= start_date, ]
df_distrib <- readRDS("C:/Data/Fidas/promo/df_distrib_2min.rds")
df_distrib <- df_distrib[df_distrib$date >= start_date, ]

# Interactive time series of the PM2.5 concentrations
p <- df_fidas %>%
  ggplot(aes(x = date, y = PM2.5)) +
  geom_line()
ggplotly(p, dynamicTicks = TRUE)

# Interactive time series of the relative humidity
p <- df_weather %>%
  ggplot(aes(x = date, y = rh)) +
  geom_line()
ggplotly(p, dynamicTicks = TRUE)

PM\(_{2.5}\) concentrations

Statistics on the PM\(_{2.5}\) concentrations reported by the Fidas 200S, per month.

# Monthly summary statistics of the PM2.5 concentrations
df_fidas$month <- cut(df_fidas$date, breaks = "1 month")
df_fidas %>%
  group_by(month) %>%
  summarise(min = min(PM2.5, na.rm = TRUE),
            Quart1 = quantile(PM2.5, probs = 0.25, na.rm = TRUE),
            Median = median(PM2.5, na.rm = TRUE),
            Quart3 = quantile(PM2.5, probs = 0.75, na.rm = TRUE),
            max = max(PM2.5, na.rm = TRUE)
            )

# Monthly boxplots, with the y axis limited to the 10th-90th percentile range
# (na.rm = TRUE is needed because quantile() fails on missing values)
p <- df_fidas %>%
  ggplot() +
  geom_boxplot(aes(x = month, y = PM2.5), outlier.shape = NA) +
  scale_y_continuous(limits = quantile(df_fidas$PM2.5, c(0.1, 0.9), na.rm = TRUE))
p

#ggplotly(p, dynamicTicks = TRUE)

Concentration levels were higher between February 2021 and the end of April 2021, then returned to lower levels in May/June 2021. November 2020 and June 2021 registered the highest peaks. August, September and October registered concentrations > 100 \(\mu g.m^{-3}\).

Relative humidity

Statistics on the relative humidity reported by the Fidas 200S, per month.

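# Monthly summary statistics of the relative humidity
df_weather$month <- cut(df_weather$date, breaks = "1 month")
df_weather %>%
  group_by(month) %>%
  summarise(min = min(rh, na.rm = TRUE),
            Quart1 = quantile(rh, probs = 0.25, na.rm = TRUE),
            Median = median(rh, na.rm = TRUE),
            Quart3 = quantile(rh, probs = 0.75, na.rm = TRUE),
            max = max(rh, na.rm = TRUE)
            )

# Monthly boxplots, with the y axis limited to the 10th-90th percentile range
# (na.rm = TRUE is needed because quantile() fails on missing values)
p <- df_weather %>%
  ggplot() +
  geom_boxplot(aes(x = month, y = rh), outlier.shape = NA) +
  scale_y_continuous(limits = quantile(df_weather$rh, c(0.1, 0.9), na.rm = TRUE))
p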

Relative humidity follows the expected yearly pattern, peaking in December during winter.

Temperature

Statistics on the temperature reported by the Fidas 200S, per month.

# Monthly summary statistics of the temperature
df_weather$month <- cut(df_weather$date, breaks = "1 month")
df_weather %>%
  group_by(month) %>%
  summarise(min = min(temperature, na.rm = TRUE),
            Quart1 = quantile(temperature, probs = 0.25, na.rm = TRUE),
            Median = median(temperature, na.rm = TRUE),
            Quart3 = quantile(temperature, probs = 0.75, na.rm = TRUE),
            max = max(temperature, na.rm = TRUE)
            )

# Monthly boxplots, with the y axis limited to the 10th-90th percentile range
# (na.rm = TRUE is needed because quantile() fails on missing values)
p <- df_weather %>%
  ggplot() +
  geom_boxplot(aes(x = month, y = temperature), outlier.shape = NA) +
  scale_y_continuous(limits = quantile(df_weather$temperature, c(0.1, 0.9), na.rm = TRUE))
p

The same holds for temperature, with a low in December.

Visualisation of the data from the sensors

The goal is to detect anomalies in the data and faulty sensors.

The visualisation is available at Time series sensors.

Data analysis conducted for the study

The following points have been investigated:

  • impact of environmental parameters on the performance of the sensors

  • impact of the time averaging

  • performances of calibration methods

  • performances of calibration scenarios (length and duration of the calibration).

Impact of the time averaging on the certification

This is presented in Demonstration of equivalence for daily, hourly and 2min averages.

The demonstration of equivalence is calculated for each window of 40 consecutive days in the data, for each sensor.
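As a minimal sketch of how such windows could be enumerated, the helper below lists every run of 40 consecutive days between two dates. The function name `consecutive_windows`, the one-day step between successive windows and the example dates are assumptions made for illustration; the actual script may build its list of dates differently.

```r
# Hypothetical helper: start and end dates of every window of `window_days`
# consecutive days between `first_day` and `last_day` (both included).
# A one-day step between successive windows is an assumption.
consecutive_windows <- function(first_day, last_day, window_days = 40) {
  first_day <- as.Date(first_day)
  last_day <- as.Date(last_day)
  last_start <- last_day - (window_days - 1)
  if (last_start < first_day) {
    # Not enough days for a single window
    return(data.frame(start = as.Date(character(0)),
                      end = as.Date(character(0))))
  }
  starts <- seq(first_day, last_start, by = "day")
  data.frame(start = starts, end = starts + (window_days - 1))
}

# Illustrative dates only
windows <- consecutive_windows("2020-07-01", "2020-08-31")
```

Each row of `windows` can then be used to subset the data before computing the demonstration of equivalence for one sensor.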

This script should have run correctly. 25/05/2022

30/05/2022 - yes it did, but it was not possible to get the starting day for the calibration, so I ran it again, modified this time to be able to know when the sensors obtained less good results. The script has now run.

31/05/2022 - The script has to be run again as I had not removed the day when I did the first experiment with the incense.

22/06/2022 - the plots have been generated, including the test of significance.

Test calibration

The calibration is first tested by using the first two weeks of the data for calibration and the next 40 days for verification. This is calculated by test_calibration/test_calibration.r and presented in Test calibration.
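The split described above can be sketched as follows. This is not the content of test_calibration/test_calibration.r: the helper name `split_calibration`, the `date` column, the UTC time zone and the toy data are assumptions.

```r
# Hypothetical helper: split a data frame into a calibration period
# (first `calib_days` days) and a verification period (next `verif_days` days).
# The `date` column name and the time zone are assumptions.
split_calibration <- function(df, start, calib_days = 14, verif_days = 40,
                              tz = "UTC") {
  calib_start <- as.POSIXct(start, tz = tz)
  calib_end <- calib_start + as.difftime(calib_days, units = "days")
  verif_end <- calib_end + as.difftime(verif_days, units = "days")
  list(
    calibration = df[df$date >= calib_start & df$date < calib_end, , drop = FALSE],
    verification = df[df$date >= calib_end & df$date < verif_end, , drop = FALSE]
  )
}

# Toy data: one observation per day for 60 days
toy <- data.frame(
  date = seq(as.POSIXct("2020-07-01", tz = "UTC"), by = "day", length.out = 60),
  pm = 1
)
parts <- split_calibration(toy, "2020-07-01")
```

The two periods do not overlap because the calibration interval excludes its end point, which is the first instant of the verification interval.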

Note: the first script passed without issue. The script containing the ML methods does not work from GBM RH, check the way I pass “corrected” back to the dataset. 22/06/2022 - this has been corrected.

Variation of the calibration through the year

The sensors are calibrated with a range of methods, using the first 2 weeks of the data at a 2min resolution. The performance of the calibration is then assessed throughout the year, per week and per month. Plotted by calibration_2weeks_restoftheyear.rmd, with the calculations being done by calibration_2weeks_restoftheyear/calibration_2weeks_restoftheyear.r and aggregated by calibration_2weeks_restoftheyear/calibration_2weeks_restoftheyear_agg.r. (This has not been done yet 23/05/2022)

I am having issues at the moment with the calibrate_sensors function for koehler mass, apparently with the size of the dataset to hold in memory. I could split the test dataset in two and that would not change the results or the script to aggregate the data - 24/05/2022 11:13.

The results will be available in Calibration on 2 first weeks, performances on the year 23/05/2022.

02/06/2022 - Results are good and available, no need to do more on this point.

Robust method selection

Ran the first two iterations of the file and got the following warning messages:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 7: sensor = "SPS-41N3". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 7: sensor = "SPS-41N3". 

I need to re-run these first two iterations as there may be an issue with some of the sensors.
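One possible way to handle these warnings, sketched below, is to raise the iteration limit of `MASS::rlm` through its `maxit` argument (20 by default, which matches the "20 steps" in the messages) and to record whether the fit converged. The wrapper name `fit_rlm_safely`, the value `maxit = 100` and the toy data are assumptions, and this is not necessarily what the scripts in this folder do.

```r
library(MASS)

# Hypothetical wrapper around MASS::rlm: allow more iterations than the
# default of 20 and return NULL instead of stopping when the fit errors
# (for example on a singular design matrix).
fit_rlm_safely <- function(formula, data, maxit = 100) {
  tryCatch(
    rlm(formula, data = data, maxit = maxit),
    error = function(e) {
      message("rlm failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Toy data, only to illustrate the behaviour
set.seed(1)
toy <- data.frame(x = 1:50)
toy$y <- 2 * toy$x + rnorm(50)
good_fit <- fit_rlm_safely(y ~ x, toy)

# A constant predictor makes the design matrix singular, so rlm() errors
flat <- data.frame(x = rep(1, 10), y = 1:10)
bad_fit <- fit_rlm_safely(y ~ x, flat)
```

The `converged` element of the returned object (`good_fit$converged`) tells whether the iterations converged, which may be a more convenient check than reading the warnings after each run.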

On run number 3:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
3: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
4: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
5: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
6: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
7: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
8: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
9: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
10: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
11: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
12: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
13: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
14: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
15: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
16: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
17: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
18: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
19: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".

On run number 4:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 

On run number 5, I obtained the following error message:

Robust RLM - Huber and RH
Error: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
x 'x' is singular: singular fits are not implemented in 'rlm'
i The error occurred in group 5: sensor = "PMS-86N2".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i the standard deviation is zero
i The warning occurred in group 5: sensor = "PMS-86N2". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i the standard deviation is zero
i The warning occurred in group 5: sensor = "PMS-86N2". 
3: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i the standard deviation is zero
i The warning occurred in group 5: sensor = "PMS-86N2". 

This happened for the calibration starting on 2020-12-22, which is date_list[149][[1]]. It seems that “PMS-86N2” stopped working on the 23rd. This means that I should check that I do have 14 days of data for the calibration and 40 days of data for the verification. From the time series, it is actually the whole N2 box that stopped working on that date. The SD card seems to be corrupted: the .tbz files are 0 kB in size, so there is no way to recover the data.
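The check mentioned above could look like the sketch below, which counts, for each sensor, the number of distinct days with at least one non-missing value in a given period. The helper name `days_with_data`, the column names `sensor`, `date` and `pm`, and the toy sensors are assumptions.

```r
# Hypothetical helper: number of distinct days with at least one non-missing
# value, per sensor, in the `n_days` days following `start`.
# Column names (`sensor`, `date`, `pm`) and the time zone are assumptions.
days_with_data <- function(df, start, n_days, tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end <- start + as.difftime(n_days, units = "days")
  sensors <- unique(as.character(df$sensor))
  counts <- sapply(sensors, function(s) {
    rows <- df$sensor == s & df$date >= start & df$date < end & !is.na(df$pm)
    length(unique(as.Date(df$date[rows], tz = tz)))
  })
  names(counts) <- sensors
  counts
}

# Toy data: sensor_a reports for 14 days, sensor_b stops after 2 days
days <- seq(as.POSIXct("2020-12-22", tz = "UTC"), by = "day", length.out = 14)
toy <- rbind(
  data.frame(sensor = "sensor_a", date = days, pm = 1),
  data.frame(sensor = "sensor_b", date = days[1:2], pm = 1)
)
counts <- days_with_data(toy, "2020-12-22", n_days = 14)
complete <- names(counts)[counts >= 14]
```

Running the same check with `n_days = 40` on the verification period would flag sensors that stop reporting partway through, before any regression is fitted.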

How many sensors does that leave me with?

The 6th iteration yielded the following warnings:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "SPS-21N4". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 
3: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "SPS-21N4". 
4: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 

The results are quite different from the thesis. I suspect that this has to do with the rolling average that I used for the thesis but that I do not want to use here (because of Ben Barrat’s comments regarding the fact that the data becomes harder to interpret when using these rolling averages.)

_8 - 02-06-2022 - gave the following warnings:

1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "SPS-21N4". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 
3: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "SPS-21N4". 
4: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 

02/06/2022 - the script has been re-run for dates_list[38:94] which correspond to the dates containing the 15th October (first incense experiment).

After that script is run, check the results but they should be fine.

03/06/2022 - all ran correctly.

Calibration scenarios comparison

2w 8w

Run first 3 iterations of 2w

The second iteration got the following warning message:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 7: sensor = "SPS-41N3". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 7: sensor = "SPS-41N3". 

_4 gave the following warnings:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
3: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 7: sensor = "SPS-41N3".
4: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
5: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
6: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
7: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
8: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
9: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
10: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
11: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
12: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
13: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
14: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
15: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
16: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
17: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
18: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
19: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
20: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
21: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "PMS-63N3".
22: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
23: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
24: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 4: sensor = "PMS-75N3".
25: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1".
26: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1".

_5 ran without issues.

_6 gave the following warnings:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "SPS-21N4". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 
3: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 2: sensor = "SPS-21N4". 
4: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 

_7 gave the following error:

Error in combn(paste0(unique(df$sensor)), 2) : n < m
In addition: Warning messages:
1: In min.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to min; returning Inf
2: In min.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to min; returning Inf
3: In min.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to min; returning Inf
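The `n < m` error is raised by `combn()` when it is asked for pairs from fewer than two elements, which, together with the `min()` warnings on empty input, suggests that fewer than two sensors had data in that iteration. A minimal guard, with the hypothetical name `sensor_pairs`, could be:

```r
# Hypothetical guard around combn(): return an empty 2-row matrix instead of
# failing with "n < m" when fewer than two sensors are available.
sensor_pairs <- function(sensors) {
  sensors <- unique(as.character(sensors))
  if (length(sensors) < 2) {
    return(matrix(character(0), nrow = 2, ncol = 0))
  }
  combn(sensors, 2)
}

pairs_three <- sensor_pairs(c("SPS-21N4", "SPS-37N1", "SPS-41N3"))
pairs_one <- sensor_pairs("SPS-37N1")
pairs_none <- sensor_pairs(character(0))
```

Code that loops over the columns of the result then simply does nothing when there are no pairs to compare.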

03/06/2022 - all iterations of 2w8w have been run

1w 8w 1w

2022/06/03 - The initial script as well as _1, _2, _3 and _4 ran with no issues. _5 was still running; I stopped it midway.

2022/06/07 - running _5

_5 gave the following warnings:

Warning messages:
1: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 3: sensor = "SPS-31N1". 
2: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 3: sensor = "SPS-31N1". 
3: Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1".

2w 8w 2w

2022/06/07 - initial script + _1, _2, _3 and _4 running

2022/06/07 - initial finished without issues + _5 running

_1 gave the following warning:

Warning message:
Problem with `mutate()` column `regression`.
i `regression = map(...)`.
i 'rlm' failed to converge in 20 steps
i The warning occurred in group 5: sensor = "SPS-37N1". 

_2, _3, _4, _5, _6 finished running with no issues.

4w 8w

2022/06/07 - launch initial + _1

2022/06/08 - initial finished running with no errors + _2 to _5 launched

All finished running without errors.

2w16w2w

All launched.

initial and _1 finished without errors.

References

Johnston, Steven J., Philip J. Basford, Florentin M. J. Bulot, Mihaela Apetroaie-Cristea, Natasha H. C. Easton, Charlie Davenport, Gavin L. Foster, Matthew Loxham, Andrew K. R. Morris, and Simon J. Cox. 2019. “City Scale Particulate Matter Monitoring Using LoRaWAN Based Air Quality IoT Devices.” Sensors 19 (1): 209. https://doi.org/10.3390/s19010209.